Visualising relationships
Humboldt-Universität zu Berlin
Wed., 25 October 2023
Last week we learned…
dplyr package from the tidyverse
pipe (|>) to feed the result of one function into another function
filter(), arrange()
rename(), mutate(), select(), relocate()
dplyr functions with plots from ggplot2
Today we will learn…
The required reading in preparation for this topic is Ch. 2 (Data visualisation), from Section 2.5, in Wickham et al. (2023).
A supplementary reading is Ch. 3 (Data visualisation) in Nordmann & DeBruine (2022).
tidyverse family of packages
ggplot2 for plots
dplyr for data wrangling
ggthemes for colorblind-friendly color palettes
patchwork for plot layouts
languageR for linguistic datasets
I set my preferred ggplot theme globally. This means that after I run this code, the plots will all use this theme.
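A minimal sketch of this set-up, assuming the packages are already installed. The slides do not say which theme is the preferred one, so `theme_bw()` is only a stand-in here, and plain `library()` calls are used for loading.

```r
# Load the packages listed above
library(tidyverse)  # attaches ggplot2 and dplyr, among others
library(ggthemes)   # colorblind-friendly palettes
library(patchwork)  # plot layouts
library(languageR)  # linguistic datasets

# Set a theme globally: every plot created after this line uses it.
# Assumption: theme_bw() is a placeholder for the preferred theme.
theme_set(theme_bw())
```

Any complete theme can be passed to `theme_set()`; calling it once near the top of a script avoids adding the theme to each plot separately.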
We will use the english dataset from Baayen & Shafaei-Bajestan (2019).
exp() function to do this
english dataset
Our variables of interest are:
| variable | description | type |
|---|---|---|
| RTlexdec | Reaction times for a visual lexical decision (milliseconds) | continuous |
| RTnaming | Reaction times for the onset of a verbal word naming task (milliseconds) | continuous |
| WrittenFrequency | numeric vector with log frequency in the CELEX lexical database. | continuous |
| Word | a factor with 2284 words | categorical |
| AgeSubject | a factor with as levels the age group of the subject: young versus old. | categorical |
| WordCategory | a factor with as levels the word categories N (noun) and V (verb). | categorical |
| CV | factor specifying whether the initial phoneme of the word is a consonant (C) or a vowel (V). | categorical |
| CorrectLexdec | numeric vector with the proportion of subjects that accepted the item as a word in lexical decision. | continuous |
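The `exp()` step mentioned above could look roughly like this. The sketch assumes that the reaction times in `english` are stored on the log scale, as described in the languageR documentation, and that we want them in milliseconds as in the table. The object name `df_english` is hypothetical.

```r
library(dplyr)
library(languageR)

# Keep the variables of interest and back-transform the log reaction times
df_english <- english |>
  select(RTlexdec, RTnaming, WrittenFrequency, Word,
         AgeSubject, WordCategory, CV, CorrectLexdec) |>
  mutate(
    RTlexdec = exp(RTlexdec),  # log RT -> milliseconds
    RTnaming = exp(RTnaming)   # log RT -> milliseconds
  )

# Quick look at the result
glimpse(df_english)
```

`WrittenFrequency` is left on the log scale, which matches its description in the table.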
Task: visualising relationships
Task: visualising relationships in distributions
Figure 1: Visualising relationships in distributions
Figure 2: Visualising relationships in distributions
in the english dataset we have the variables WrittenFrequency and RTlexdec
fill or colour
geom_point(), it’s also helpful to use shape
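As an illustration of these aesthetics, the sketch below maps a third variable to both `colour` and `shape` in a scatterplot. Choosing `AgeSubject` as the third variable, the transparency setting, and the object names are assumptions made for this example.

```r
library(dplyr)
library(ggplot2)
library(languageR)

# Back-transform log reaction times to milliseconds (see exp() above)
df_english <- english |>
  mutate(RTlexdec = exp(RTlexdec))

# Two continuous variables on x and y, a categorical one as colour and shape
p_age <- df_english |>
  ggplot(aes(x = WrittenFrequency, y = RTlexdec,
             colour = AgeSubject, shape = AgeSubject)) +
  geom_point(alpha = 0.5)  # semi-transparent points reduce overplotting

p_age
```

Mapping the same variable to `colour` and `shape` gives a single combined legend and keeps the groups distinguishable in greyscale print.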
Exercise 1: Adding another variable
Example 1
How might you include a fourth variable in the plot above? Try adding CV. Does the plot still tell a clear story?
facet_wrap(), which takes a formula as its argument
~ and the name of a categorical variable, e.g., ~CV
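A short sketch of faceting with the formula `~CV`: each level of `CV` gets its own panel. The base plot and the object names are assumptions carried over for illustration.

```r
library(dplyr)
library(ggplot2)
library(languageR)

# Back-transform log reaction times to milliseconds
df_english <- english |>
  mutate(RTlexdec = exp(RTlexdec))

# One panel per level of CV (C = consonant-initial, V = vowel-initial)
p_facet <- df_english |>
  ggplot(aes(x = WrittenFrequency, y = RTlexdec,
             colour = AgeSubject, shape = AgeSubject)) +
  geom_point(alpha = 0.5) +
  facet_wrap(~CV)

p_facet
```

Faceting keeps the colour and shape mappings free for another variable, which is often easier to read than packing four variables into a single panel.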
ggplot()
dplyr verb?
Plot annotation
The point of data visualisation is to communicate something about your data. In order to do this, we need to facilitate the reader’s understanding of what our plot is showing. Right now, our plots don’t have a title, and the axis labels correspond to the variable names in our data, which may not be interpretable to an outsider.
All this to say, remember to give useful labels to your plots. Let’s add a title, and x- and y-axis labels. There are several different ways to do this, but I find the cleanest way is to use the labs() ggplot-layer, which takes as its arguments title = "", x = "", and y = "". If you also have other aesthetics (e.g., fill or shape), you can add labels to those to make sure your legend also has a reader-friendly title.
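For instance, `labs()` can be added as a layer like this. The label texts and object names are only suggestions, and the plot assumes the reaction times were back-transformed to milliseconds with `exp()`.

```r
library(dplyr)
library(ggplot2)
library(languageR)

# Back-transform log reaction times to milliseconds
df_english <- english |>
  mutate(RTlexdec = exp(RTlexdec))

p_labelled <- df_english |>
  ggplot(aes(x = WrittenFrequency, y = RTlexdec,
             colour = AgeSubject, shape = AgeSubject)) +
  geom_point(alpha = 0.5) +
  labs(
    title = "Lexical decision times by written frequency",
    x = "Written frequency (log)",
    y = "Reaction time (ms)",
    colour = "Age group",  # legend title for the colour aesthetic
    shape = "Age group"    # same title, so the two legends are merged
  )

p_labelled
```

Giving `colour` and `shape` the same label keeps them in one legend; different labels would split it into two.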
#|
```{r}```
| option | values | function |
|---|---|---|
| #\| echo: | true/false | should this code chunk be printed when rendering? |
| #\| eval: | true/false | should this code chunk be run when rendering? |
we often want to use our plots in a document that is not created in RStudio
to do this we need to load in our figures as an accepted file type, such as jpeg or png
we can do this with the ggsave() function
Can you guess what types of arguments ggsave() needs in order to save our plots? Some are required, some are optional.
ggsave()
At minimum ggsave() takes as its arguments:
ggsave() optional arguments
width =: how wide you want your plot to be in cm, mm, inches, or pixels
height =:
dpi =: desired resolution (numerical, or a set of strings: “retina” = 320, “print” = 300, or “screen” = 72)
bg =: background colour, e.g., “black”
eval: false
Warning
Always set code chunks that save files to your machine to eval: false! Otherwise, every time that you run your script, the file will be overwritten locally.
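Putting the pieces together, a chunk that saves a plot might look like the sketch below. The file name, the plot, and the size values are hypothetical. Inside a rendered document, the first line tells the renderer to skip the chunk; in a plain R session it is just a comment.

```r
#| eval: false
library(ggplot2)
library(languageR)

# A simple plot to save (RTlexdec is still on the log scale here)
p_freq <- ggplot(english, aes(x = WrittenFrequency, y = RTlexdec)) +
  geom_point(alpha = 0.5)

# Hypothetical file name, written to the current working directory
ggsave(
  filename = "fig_frequency_rt.png",
  plot = p_freq,
  width = 16,
  height = 10,
  units = "cm",    # unit for width and height
  dpi = "print",   # i.e., 300
  bg = "white"     # background colour
)
```

The file type is taken from the extension of the file name, so changing `.png` to `.jpeg` saves a JPEG instead.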
Exercise 2: ggsave()
Example 2
Created with R version 4.3.0 (2023-04-21) (Already Tomorrow) and RStudio version 2023.9.0.463 (Desert Sunflower).
R version 4.3.0 (2023-04-21)
Platform: aarch64-apple-darwin20 (64-bit)
Running under: macOS Ventura 13.2.1
Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRlapack.dylib; LAPACK version 3.11.0
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
time zone: Europe/Berlin
tzcode source: internal
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] kableExtra_1.3.4 knitr_1.44 languageR_1.5.0 ggthemes_4.2.4
[5] patchwork_1.1.3 lubridate_1.9.2 forcats_1.0.0 stringr_1.5.0
[9] dplyr_1.1.3 purrr_1.0.2 readr_2.1.4 tidyr_1.3.0
[13] tibble_3.2.1 ggplot2_3.4.3 tidyverse_2.0.0
loaded via a namespace (and not attached):
[1] utf8_1.2.3 generics_0.1.3 xml2_1.3.4 stringi_1.7.12
[5] hms_1.1.3 digest_0.6.33 magrittr_2.0.3 evaluate_0.21
[9] grid_4.3.0 timechange_0.2.0 fastmap_1.1.1 jsonlite_1.8.7
[13] httr_1.4.6 rvest_1.0.3 fansi_1.0.4 viridisLite_0.4.2
[17] scales_1.2.1 cli_3.6.1 rlang_1.1.1 munsell_0.5.0
[21] withr_2.5.0 yaml_2.3.7 tools_4.3.0 tzdb_0.4.0
[25] colorspace_2.1-0 webshot_0.5.4 pacman_0.5.1 vctrs_0.6.3
[29] R6_2.5.1 lifecycle_1.0.3 pkgconfig_2.0.3 pillar_1.9.0
[33] gtable_0.3.4 glue_1.6.2 systemfonts_1.0.4 highr_0.10
[37] xfun_0.39 tidyselect_1.2.0 rstudioapi_0.14 farver_2.1.1
[41] htmltools_0.5.5 labeling_0.4.3 rmarkdown_2.22 svglite_2.1.1
[45] compiler_4.3.0
Week 2 - Data visualisation 2: Relationships